Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

feat: split allow_list_answers column in analytics_summary view #2926

Merged
merged 2 commits into from
Mar 27, 2024

Conversation

Mike-Heneghan
Copy link
Contributor

@Mike-Heneghan Mike-Heneghan commented Mar 26, 2024

What

  • Add separate columns for each of the possible ALLOW_LIST variables
  • Add the created_at for the analytics_log rather than just the created_at for the analytics session

Why

  • It was found that the allow_list_answers with the new data structure was difficult to work with on Metabase

Follow Up

  • Noticed that the allow_list_answer data structure is inconsistent i.e. {key: value} rather than {key: [value]} Screenshot 2024-03-26 at 12 59 07

  • Noticed that the null values are no longer being filtered out:

    const filteredAnswers = answerValues?.filter(Boolean);
    which leads to [null] surprisingly often? Screenshot 2024-03-26 at 13 01 39

  • Check that the value of allow list as an array is usable on Metabase: Screenshot 2024-03-26 at 13 03 53

  • Remove the nodeFn column from analytics_logs as it's not reliable with the previous changes to the way we gather allow_list answers i.e. it naively stores the nodeToTrack?.data?.fn || nodeToTrack?.data?.val and missed when the passport variable is defined by the breadcrumbs. Screenshot 2024-03-26 at 13 15 28

  • Align the data structure for the allow_list between analytics_logs and lowcal_sessions i.e. [{key: [value]}], vs {key: [value]} respectively.

    Screenshot 2024-03-26 at 13 01 39 Screenshot 2024-03-26 at 15 14 29

@Mike-Heneghan Mike-Heneghan self-assigned this Mar 26, 2024
@Mike-Heneghan Mike-Heneghan force-pushed the mh/allow-list-answers-columns-analytics-summary branch from 642d69e to c30abd2 Compare March 26, 2024 13:08
@Mike-Heneghan Mike-Heneghan force-pushed the mh/allow-list-answers-columns-analytics-summary branch from c30abd2 to e375827 Compare March 26, 2024 13:10
) as time_spent_on_analytics_session_minutes,
node_id,
al.allow_list_answers as allow_list_answers,
allow_list_answer_elements->>'proposal.projectType' AS proposal_project_type,
Copy link
Contributor Author

@Mike-Heneghan Mike-Heneghan Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I was a little unsure about the format of the new column names.

I went for camelcase snake_case as I thought it was more consistent with the rest of the table but I'm happy to change to match the exact variable name if preferred?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping these as snake_case is correct as it's for a table / view. Metabase should automatically convert this to Proposal Project Type I think 👍

@Mike-Heneghan Mike-Heneghan marked this pull request as ready for review March 26, 2024 13:16
@Mike-Heneghan Mike-Heneghan requested review from a team and zz-hh-aa March 26, 2024 13:18
Copy link

github-actions bot commented Mar 26, 2024

Removed vultr server and associated DNS entries

Copy link
Contributor

@DafyddLlyr DafyddLlyr left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Fantastic - thanks for picking this up.

A few comments raised by PR -

  1. [null] values

This should have been handled by this line https://github.com/theopensystemslab/planx-new/blob/main/editor.planx.uk/src/pages/FlowEditor/lib/analyticsProvider.tsx#L535 but maybe it's not working as intended?

  1. { key: value } vs { key: [value] }

Yep this is also a bit of an awkward one - values in arrays can contain multiples (e.g. project type) and strings are a single value (e.g. draw boundary option). I'm not sure how Metabase will handle these? We could concatenate arrays into a string (and sort them so they're consistent) but I would think an array is more usable? Not too sure here - maybe a question for @zz-hh-aa . We can format it however we want is the truth 👍

  1. node_fn

Agreed - this column is probably redundant now we have allow list answers. I vote we remove it.

  1. Align the data structure for the allow_list between analytics_logs and lowcal_sessions i.e. [{key: [value]}], vs {key: [value]} respectively.

Yep - we should do this!

Comment on lines +23 to +26
/**
* If appending to ALLOW_LIST please also update the `analytics_summary` view to
* extract it into it's own column.
*/
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Perfect - I had this exact comment in mind also!

) as time_spent_on_analytics_session_minutes,
node_id,
al.allow_list_answers as allow_list_answers,
allow_list_answer_elements->>'proposal.projectType' AS proposal_project_type,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think keeping these as snake_case is correct as it's for a table / view. Metabase should automatically convert this to Proposal Project Type I think 👍

) / 60 as numeric (10, 1)
) as time_spent_on_analytics_session_minutes,
node_id,
al.allow_list_answers as allow_list_answers,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure we need to keep the current column - what do you think?

Copy link
Contributor Author

@Mike-Heneghan Mike-Heneghan Mar 26, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I don't think it adds much value beyond being a reference to check that data is being extracted properly etc.

I'm tempted to remove it 👀

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Counter to my previous point I think it could also be handy to check whether a new value in ALLOW_LIST has been missed in the view 🤔

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yep this is a good enough reason for me to keep it - it can just not be used in Metabase directly 👍

left join analytics_logs al on a.id = al.analytics_id
left join flows f on a.flow_id = f.id
left join teams t on t.id = f.team_id
left join lateral jsonb_array_elements(al.allow_list_answers) AS allow_list_answer_elements ON true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice 👌

@Mike-Heneghan
Copy link
Contributor Author

@Mike-Heneghan
Copy link
Contributor Author

Fantastic - thanks for picking this up.

A few comments raised by PR -

  1. [null] values

This should have been handled by this line https://github.com/theopensystemslab/planx-new/blob/main/editor.planx.uk/src/pages/FlowEditor/lib/analyticsProvider.tsx#L535 but maybe it's not working as intended?

  1. { key: value } vs { key: [value] }

Yep this is also a bit of an awkward one - values in arrays can contain multiples (e.g. project type) and strings are a single value (e.g. draw boundary option). I'm not sure how Metabase will handle these? We could concatenate arrays into a string (and sort them so they're consistent) but I would think an array is more usable? Not too sure here - maybe a question for @zz-hh-aa . We can format it however we want is the truth 👍

  1. node_fn

Agreed - this column is probably redundant now we have allow list answers. I vote we remove it.

  1. Align the data structure for the allow_list between analytics_logs and lowcal_sessions i.e. [{key: [value]}], vs {key: [value]} respectively.

Yep - we should do this!

Thanks for the thorough review and responses 🙌 @DafyddLlyr

@Mike-Heneghan
Copy link
Contributor Author

  1. [null] values

This should have been handled by this line https://github.com/theopensystemslab/planx-new/blob/main/editor.planx.uk/src/pages/FlowEditor/lib/analyticsProvider.tsx#L535 but maybe it's not working as intended?

Yeah you're right! I misread the updated code and missed this. It's an odd one but I double checked and these are new records so it's not a legacy issue it seems to be live in production at the moment. Also it's not specific to one allow_list_answer key: https://metabase.editor.planx.uk/question/353-null-allow-list-value

This might be worth splitting into it's own PR and including a data cleanup migration as I imagine the null values might be a pain in Metabase but we can check that after this I guess.

@Mike-Heneghan
Copy link
Contributor Author

  1. { key: value } vs { key: [value] }

Yep this is also a bit of an awkward one - values in arrays can contain multiples (e.g. project type) and strings are a single value (e.g. draw boundary option). I'm not sure how Metabase will handle these? We could concatenate arrays into a string (and sort them so they're consistent) but I would think an array is more usable? Not too sure here - maybe a question for @zz-hh-aa . We can format it however we want is the truth 👍

I agree an array is probably more useful and Metabase might handle it gracefully 🤞 I incorrectly thought this was something to do with how we were writing the values to the logs rather than it being the data itself as below.

Screenshot 2024-03-26 at 17 14 12

Taking that into account I reckon we maybe go ahead and see how we get on unless you have any objections @zz-hh-aa

@Mike-Heneghan
Copy link
Contributor Author

  1. node_fn
    Agreed - this column is probably redundant now we have allow list answers. I vote we remove it.
  1. Align the data structure for the allow_list between analytics_logs and lowcal_sessions i.e. [{key: [value]}], vs {key: [value]} respectively.
    Yep - we should do this!

Nice one! I think I'll update both of these in a follow up PR as I think they're close enough logically in that the node_fn is moving from it's own column into the the JSONB allow_list_answers.

As part of it I think I'll also ditch the [] and make it a null in the same way that submission_services_summary has it: https://trello.com/c/zKgDcQAK/2860-update-analyticslogs-jsonb-default-values-to-null

That will also have to change how the db view is written 😅 Although the column names shouldn't change so I think it's still worth merging this to unblock but in the background change how we store the data 😌

@zz-hh-aa
Copy link
Collaborator

If I've understood correctly, I would vote in favour of strings--the Metabase GUI gets confused by JSONs but can handle strings, so it might be council-friendlier to go that route. This is a loosely held opinion!

If we go the strings route and later wanted to split the array items up, we could always separate by ,.

@Mike-Heneghan
Copy link
Contributor Author

If I've understood correctly, I would vote in favour of strings--the Metabase GUI gets confused by JSONs but can handle strings, so it might be council-friendlier to go that route. This is a loosely held opinion!

If we go the strings route and later wanted to split the array items up, we could always separate by ,.

Good point @zz-hh-aa the change here would end up with the same data structure as input_errors which we had issues with I think? It would pick up the arrays but then when we tried to apply the filter we'd get error?

Screenshot 2024-03-27 at 10 08 37

I'll have a look at updating this PR to make the column values a string separated by , if there are multiple variables as you suggest 👍

@Mike-Heneghan
Copy link
Contributor Author

If I've understood correctly, I would vote in favour of strings--the Metabase GUI gets confused by JSONs but can handle strings, so it might be council-friendlier to go that route. This is a loosely held opinion!
If we go the strings route and later wanted to split the array items up, we could always separate by ,.

Good point @zz-hh-aa the change here would end up with the same data structure as input_errors which we had issues with I think? It would pick up the arrays but then when we tried to apply the filter we'd get error?

Screenshot 2024-03-27 at 10 08 37 I'll have a look at updating this PR to make the column values a string separated by `,` if there are multiple variables as you suggest 👍

I tried updating this to turn the array into a string separated by columns with the view and it's surprisingly difficult 😬 I think it's because it's trying to do a lot of transformations at once and it's a bit of a nightmare.

I'd propose we make this change then handle it on Metabase with a "Saved Question" written in SQL which should hopefully be able to handle this?

@Mike-Heneghan Mike-Heneghan merged commit 2b2afb2 into main Mar 27, 2024
12 checks passed
@Mike-Heneghan Mike-Heneghan deleted the mh/allow-list-answers-columns-analytics-summary branch March 27, 2024 12:36
@zz-hh-aa
Copy link
Collaborator

I'd propose we make this change then handle it on Metabase with a "Saved Question" written in SQL which should hopefully be able to handle this?

Yep that's a perfect workaround--thanks for looking into it!

@Mike-Heneghan
Copy link
Contributor Author

@Mike-Heneghan
Copy link
Contributor Author

  1. [null] values

This should have been handled by this line https://github.com/theopensystemslab/planx-new/blob/main/editor.planx.uk/src/pages/FlowEditor/lib/analyticsProvider.tsx#L535 but maybe it's not working as intended?

Yeah you're right! I misread the updated code and missed this. It's an odd one but I double checked and these are new records so it's not a legacy issue it seems to be live in production at the moment. Also it's not specific to one allow_list_answer key: https://metabase.editor.planx.uk/question/353-null-allow-list-value

This might be worth splitting into it's own PR and including a data cleanup migration as I imagine the null values might be a pain in Metabase but we can check that after this I guess.

I think the issue was that https://github.com/theopensystemslab/planx-new/blob/main/editor.planx.uk/src/pages/FlowEditor/lib/analyticsProvider.tsx#L535 handles answers returned from getData but not from getAnswers

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants